    Generalised Pattern Matching Revisited

    In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relationship that is a subset of $\Sigma_T \times \Sigma_P$. We must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each length-$m$ substring of $T$ and $P$ (counting). In this work, we improve upon all previously known algorithms for this problem with respect to several parameters describing the input instance:
    * $\mathcal{D}$, the maximum number of characters that match a fixed character;
    * $\mathcal{S}$, the number of pairs of matching characters;
    * $\mathcal{I}$, the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.
    At the heart of our new deterministic upper bounds for $\mathcal{D}$ and $\mathcal{S}$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and may be of independent interest. To conclude, we demonstrate the first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
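    To make the problem statement concrete, here is a naive $O(nm)$ Python baseline for the counting and reporting versions of GPM, together with the parameters $\mathcal{S}$ and $\mathcal{D}$ computed from an explicit relation. It only illustrates the definitions and is not the algorithm described above. The encoding of the matching relationship as a set of (text character, pattern character) pairs and all function names are hypothetical, and the code assumes one reading of $\mathcal{D}$: the maximum, over characters of either alphabet, of the number of characters that match it.

```python
from collections import Counter


def gpm_count_mismatches(text, pattern, matches):
    """Counting version (naive O(n*m)): entry i is the number of positions j
    with (text[i + j], pattern[j]) NOT in the matching relation."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if (text[i + j], pattern[j]) not in matches)
        for i in range(n - m + 1)
    ]


def gpm_report(text, pattern, matches):
    """Reporting version: starting positions of windows with no mismatch."""
    counts = gpm_count_mismatches(text, pattern, matches)
    return [i for i, c in enumerate(counts) if c == 0]


def gpm_parameters(matches):
    """S = number of matching pairs.
    D = max number of characters matching a fixed character
    (assumption: the maximum is taken over both alphabets)."""
    per_text = Counter(a for a, _ in matches)
    per_pattern = Counter(b for _, b in matches)
    d = max([*per_text.values(), *per_pattern.values()], default=0)
    return len(matches), d


# Hypothetical relation: 'a' and 'b' match themselves; pattern 'x' matches a, b, c.
M = {("a", "a"), ("b", "b"), ("a", "x"), ("b", "x"), ("c", "x")}
print(gpm_count_mismatches("abcab", "ax", M))  # [0, 1, 1, 0]
print(gpm_report("abcab", "ax", M))            # [0, 3]
print(gpm_parameters(M))                       # (5, 3)
```

    With a relation like this one, where a pattern character matches a whole set of text characters, GPM generalises pattern matching with wildcards; the parameters above measure how large that relation is.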

    Counting 4-Patterns in Permutations Is Equivalent to Counting 4-Cycles in Graphs

    Permutation $\sigma$ appears in permutation $\pi$ if there exists a subsequence of $\pi$ that is order-isomorphic to $\sigma$. The natural algorithmic question is to check whether $\sigma$ appears in $\pi$ and, if so, to count the number of occurrences. Only recently has it become known that, for any fixed length $k$, we can check whether a given pattern of length $k$ appears in a permutation of length $n$ in time linear in $n$, but being able to count all such occurrences in $f(k)\cdot n^{o(k/\log k)}$ time would refute the exponential time hypothesis (ETH). Together with practical applications in statistics, this motivates a systematic study of the complexity of counting occurrences for different patterns of fixed small length $k$. We investigate this question for $k = 4$. Very recently, Even-Zohar and Leng [arXiv 2019] identified two types of 4-patterns. For the first type they designed an $\tilde{\mathcal{O}}(n)$ time algorithm, while for the second they were able to provide an $\tilde{\mathcal{O}}(n^{1.5})$ time algorithm. This raises the question of whether 4-patterns of the second type are inherently harder than those of the first type. We establish a connection between counting 4-patterns of the second type and counting 4-cycles (not necessarily induced) in a sparse undirected graph. By designing two-way reductions, we show that the complexities of both problems are the same, up to polylogarithmic factors. This allows us to leverage the work done on the latter to provide a reasonable argument for why the complexities of counting 4-patterns of the first and the second type differ. In particular, even for the seemingly simpler problem of detecting a 4-cycle in a graph on $m$ edges, the best known algorithm works in $\mathcal{O}(m^{4/3})$ time. Our reductions imply that an $\mathcal{O}(n^{4/3-\varepsilon})$ time algorithm for counting occurrences of any 4-pattern of the second type in a permutation of length $n$ would imply an exciting breakthrough for counting (and hence also detecting) 4-cycles.
    In the other direction, by plugging in the fastest known algorithm for counting 4-cycles, we obtain an algorithm for counting occurrences of any 4-pattern of the second type in $\mathcal{O}(n^{1.48})$ time.
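    The two objects being related can be illustrated with brute-force counters. The Python sketch below counts occurrences of a pattern $\sigma$ in $\pi$ by testing every length-$k$ subsequence for order-isomorphism ($\binom{n}{k}$ subsequences, so polynomial only for fixed $k$), and counts (not necessarily induced) 4-cycles by summing $\binom{c}{2}$ over vertex pairs with $c$ common neighbours. That sum counts each 4-cycle once per diagonal, i.e. twice. These are textbook baselines chosen for illustration, not the reductions or the Even-Zohar and Leng algorithms mentioned above; the function names are hypothetical.

```python
from itertools import combinations


def order_type(seq):
    """Rank tuple of seq; two sequences of distinct values are
    order-isomorphic iff their rank tuples are equal."""
    rank = {v: r for r, v in enumerate(sorted(seq))}
    return tuple(rank[v] for v in seq)


def count_pattern_occurrences(pi, sigma):
    """Brute force: count subsequences of pi order-isomorphic to sigma.
    combinations() keeps positions in increasing order, so every
    k-element subsequence of pi is examined exactly once."""
    target = order_type(sigma)
    return sum(1 for sub in combinations(pi, len(sigma))
               if order_type(sub) == target)


def count_4cycles(n, edges):
    """Count (not necessarily induced) 4-cycles in a simple undirected graph
    on vertices 0..n-1. A vertex pair {u, v} with c common neighbours is a
    diagonal of C(c, 2) 4-cycles; each 4-cycle has two diagonals, so the
    total is halved."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for u, v in combinations(range(n), 2):
        c = len(adj[u] & adj[v])
        total += c * (c - 1) // 2
    return total // 2


print(count_pattern_occurrences([1, 3, 2, 4], [1, 2]))  # 5 increasing pairs
print(count_4cycles(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]))  # 3 (K4)
```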

    Optimal Near-Linear Space Heaviest Induced Ancestors
